
[RUNTIME] Simple NDArray container API in c++ #1418

Merged · 2 commits · Jul 12, 2018
Conversation

@tqchen (Member) commented Jul 11, 2018:

Previously, TVMArray simply allocated a DLTensor, which is a plain C type. This PR changes that into an NDArray, which is reference counted and can be used as a container in C++.

TVMArray is now backed by the C++ NDArray::Container and can be passed back and forth through PackedFunc. The Python NDArray argument is now passed as the NDArray type.

The callee can also directly take out a DLTensor* when an NDArray is passed, which is compatible with the previous behavior. Moreover, the API allows the callee to take an NDArray out of TVMArgs and capture it in the function (the reference count is increased automatically). PackedFunc can now also return NDArray.

NDArray::Container* is DLTensor*-compatible and can be viewed directly as a DLTensor*.
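
As a minimal sketch of the usage this enables (assuming the runtime headers introduced here; exact signatures may differ by version):

```cpp
#include <tvm/runtime/ndarray.h>
#include <tvm/runtime/packed_func.h>

using namespace tvm::runtime;

// A PackedFunc whose argument can be viewed either as an owning NDArray
// or as a plain DLTensor*, and which returns an NDArray.
PackedFunc echo([](TVMArgs args, TVMRetValue* rv) {
  NDArray nd = args[0];     // takes a reference; safe to capture beyond this call
  DLTensor* raw = args[0];  // non-owning view, matching the previous behavior
  (void)raw;
  *rv = nd;                 // PackedFunc can now return NDArray
});
```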

Known Limitations

NDArray is a reference-counted type and cannot cross the RPC boundary. When an NDArray is passed as an argument to a remote function, the argument is downgraded to DLTensor*, so passing an NDArray still works if the remote function only modifies the data content. A remote function cannot return an NDArray.

@jroesch (Member) left a comment:

Overall this looks like a great change and will be very useful in Relay. I just put a couple of clarifying questions on the PR.

* TVMSynchronize is necessary.
*/
inline void copyto(DLTensor* other);
inline void copyto(const NDArray& other);
Member:

Do we not use Google style for these? Just want to clarify for example CopyTo or Copyto?

Member Author:

I am debating this. One exception rule here is that for the user-facing API we try to keep it as consistent as possible with numpy, which means functions like empty are used here. But for CopyTo, I think CamelCase is also good.

Member Author:

I switched to CamelCase.
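
For illustration, a rough sketch of how the renamed API would be called (NDArray::Empty and the DLPack structs are assumed; this is not code from the PR):

```cpp
// Allocate two arrays and copy between them. The copy may be asynchronous,
// so TVMSynchronize is needed before reading the result on the host.
DLDataType f32{kDLFloat, 32, 1};
NDArray a = NDArray::Empty({2, 3}, f32, DLContext{kDLCPU, 0});
NDArray b = NDArray::Empty({2, 3}, f32, DLContext{kDLGPU, 0});
a.CopyTo(b);
TVMSynchronize(kDLGPU, 0, nullptr);
```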

*/
std::vector<int64_t> shape_;
/*! \brief The internal array object */
std::atomic<int> ref_counter_{0};
Member:

Do we expect to share these across threads without synchronization? Atomic ref-counts are pretty expensive compared to non-atomic ones.

Member Author (@tqchen, Jul 11, 2018):

We might need this for multi-threaded use. In most cases, the overhead of an atomic is fine compared to the overhead of everything else (computing, Python, and kernel launch).

Member:

👍
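
A minimal sketch of the reference-counting scheme being discussed (illustrative, not the actual container code):

```cpp
#include <atomic>

struct ContainerSketch {
  std::atomic<int> ref_counter_{0};

  void IncRef() {
    // Relaxed ordering is enough for increments; only destruction needs ordering.
    ref_counter_.fetch_add(1, std::memory_order_relaxed);
  }
  void DecRef() {
    // release/acquire pairing ensures the final deleter sees all prior writes.
    if (ref_counter_.fetch_sub(1, std::memory_order_release) == 1) {
      std::atomic_thread_fence(std::memory_order_acquire);
      delete this;
    }
  }
};
```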

std::atomic<int> ref_counter_{0};
};

// implementations
Member:

Improve comment?

@@ -394,8 +409,9 @@ class TVMRetValue : public TVMPODValue_ {
using TVMPODValue_::operator int;
using TVMPODValue_::operator bool;
using TVMPODValue_::operator void*;
using TVMPODValue_::operator TVMArray*;
using TVMPODValue_::operator DLTensor*;
Member:

Why do we pass the underlying pointer instead of the container?

Member Author:

We store it by pointer, but from the user's perspective they pass in containers.

Member:

I see now, after looking at the Python code more carefully.
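
In other words, something like the following sketch (get_some_array is a hypothetical PackedFunc; both conversions are assumed to be available on the return value):

```cpp
// The container is stored by pointer internally, but callers work with
// value types: a borrowed DLTensor* view or an owning NDArray handle.
TVMRetValue rv = get_some_array();
DLTensor* view = rv;   // non-owning view of the underlying tensor
NDArray owned = rv;    // owning handle; bumps the reference count
```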

}
delete ptr;
}
// Deleter for NDArray converted from DLPack
Member:

Just want to understand: is this deleter used for data that comes from external DLPack arrays, not allocated inside of TVM? Could we maybe improve the comment here? We could borrow from the text of the PR; it would probably help future readers of this code.
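
Something along these lines is what the deleter appears to do (an illustrative sketch, not the exact PR code):

```cpp
// Deleter for an NDArray::Container that wraps an external DLManagedTensor:
// the data was not allocated by TVM, so hand it back via the DLPack deleter.
void DLPackDeleterSketch(NDArray::Container* ptr) {
  DLManagedTensor* tensor = static_cast<DLManagedTensor*>(ptr->manager_ctx);
  if (tensor->deleter != nullptr) {
    (*tensor->deleter)(tensor);
  }
  delete ptr;
}
```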

}
delete ptr;
}
// local create function that
Member:

Could we improve the comment to be a full sentence? I can do another pass later if you don't have cycles.

e.g. "Local create function which allocates tensor metadata but does not allocate space for the data."

DLDataType dtype,
DLContext ctx);
/*!
* \brief Create a NDArray backed by dlpack tensor.
Member:

Create a NDArray backed by a dlpack tensor.?

@tqchen (Member Author) commented Jul 12, 2018:

@jroesch I addressed all of your comments; can you take another look?

@jroesch (Member) commented Jul 12, 2018:

@tqchen seems good to me 👍
